DOCUMENT RESUME



SO 187 f27 

TM 800 222

AUTHOR          David, Jane L.
TITLE           Local Uses of Title I Evaluations.
INSTITUTION     Stanford Research Inst., Menlo Park, Calif.
                Educational Policy Research Center.
SPONS AGENCY    Office of Education (DHEW), Washington, D.C. Office
                of Planning, Budgeting, and Evaluation.
REPORT NO       EPRC 21
PUB DATE        Jul 78
CONTRACT        100-77-0095
NOTE            55p.

EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     *Achievement Tests; Compensatory Education; Decision
                Making; Educational History; Elementary Secondary
                Education; Evaluation Needs; Informal Assessment;
                *Information Utilization; Program Attitudes; Program
                Effectiveness; *Program Evaluation; Program
                Improvement; *School Districts; Standardized Tests
IDENTIFIERS     *Elementary Secondary Education Act Title I;
                *Metaevaluation

ABSTRACT



A survey of administrators, teachers, and parents in
15 Elementary Secondary Education Act Title I school districts
indicated that evaluations are used locally to meet federal and state
requirements, to inform parents and staff, and to confirm positive
attitudes toward a program. Standardized achievement tests, the
backbone of Title I evaluations, are viewed as inadequate for judging
or improving programs. Respondents felt tests were biased and
preferred other measures: skill-specific tests, observation, and
self-concept or attitude measures. Evaluations were not used to
improve programs for several reasons: program stability; funds;
politics; unavailability of results in time for decision making; and
differing information needs of federal, local, and state agencies.
Respondents disliked evaluation and ignored negative results if they
believed in a program. Until these underlying attitudes change,
evaluation results, even if technically sound, will not be used. Of
the two current federal strategies for Title I evaluation, an
independent national study and a local-to-state-to-federal reporting
scheme, the latter is more likely to be used. The federal government
should be committed to increasing communication between program staff
and evaluation staff, and to promoting evaluation among local staff.
(CP)




***********************************************************************
*  Reproductions supplied by EDRS are the best that can be made       *
*                    from the original document.                      *
***********************************************************************



LOCAL USES OF 
TITLE I EVALUATIONS 



July 1978 



Research Report EPRC 21




By: Jane L. David

Senior Policy Analyst 

Prepared for:

Office of the Assistant Secretary for Planning and Evaluation
Department of Health, Education, and Welfare
Washington, D.C. 20201



Contract HEW-100-77-0095



SRI Project URU-6854




EXECUTIVE SUMMARY

Statement of the Problem

Title I of the Elementary and Secondary Education Act of 1965 (ESEA)
was the first major social legislation to require program evaluation. The
original requirement for Title I evaluations and its subsequent elabora-
tion in the 1974 Amendments to the Act have resulted in a variety of
interpretations of the purposes of the evaluations and several Federal
strategies for their conduct. Since 1965, the Federal strategies for
Title I evaluation adopted by the United States Office of Education (USOE)
have emphasized Federal information needs. By contrast, the legislative
history of ESEA reflects a strong Congressional interest in the provision
of evaluation information that is also useful for program improvement at
the local level. The extent to which Title I evaluations have met Federal
information needs has been studied, but there has been little attention
paid to the impact of Federally mandated evaluations at the local level.
This study was designed to investigate whether the same evaluation system
can serve both local and Federal needs through an examination of local
uses of Title I evaluation.

Objectives

This study was designed to answer two major questions: Do local staff
use Title I evaluation results to identify strengths and weaknesses of
their programs in order to improve them? Are the recent and proposed
changes in the Title I evaluation system likely to alter local use of
evaluation? Specifically, the study investigated how local Title I staff
and parents use their Title I evaluation, what information they use in
judging the effectiveness of their program, and how they make decisions
about changing the program. The objective was to produce a report to
document the history of Federal strategies in Title I evaluation, the
uses of Title I evaluations by local staff and parents, the other types
of information used by local staff and parents in judging and in improving
their program, and the implications of these findings for the current
Federal Title I evaluation strategy.

Methodology

The primary sample consists of 15 Title I districts in six states.
The districts were selected among those reputed to have an above-average
emphasis on, or concern with, evaluation. The identification of such dis-
tricts was based on recommendations of USOE staff, Technical Assistance
Center directors, and state Title I directors. Although the sample is
not nationally representative, choosing districts especially concerned
with evaluation ensures that the findings are based on situations with the
greatest potential for use of evaluations. In addition, the sample was
augmented by field notes from another 15 districts collected in a concurrent
USOE-funded study that involved interviews concerning evaluation in Title I
districts.

The data collection consisted of face-to-face interviews with Title I
administrators, principals of Title I schools, Title I teaching staff, and
parents of Title I students. Copies of evaluation reports and other related
documents were also obtained. A district visit was made by one or two
interviewers for one to two days. The interviews were structured to the
extent that the same topics were pursued in each interview, but the emphasis
on each topic and the specific questions were tailored to each situation
and respondent.

The analysis consisted of drawing a tentative set of generalizations
from several readings of the field notes. For each generalization, the
notes were gone through carefully, extracting evidence in support of
and opposed to the generalizations. After refining the general statements
to be reported, quotations illustrating each point were pulled from the
notes. From these lists, examples were selected for inclusion in the
report, thus ensuring that the quotations reported in the text are indeed
representative of the responses.




Posttest or gain scores reported for each project on standardized
achievement tests comprise the main part of the district Title I evalua-
tion for all the districts visited. Therefore, the findings often indi-
cate uses of and attitudes towards standardized achievement tests rather
than the evaluation report per se.

In general, the primary function the evaluations serve is to meet
state and Federal reporting requirements; districts find that employing
standardized achievement tests is the simplest way to meet these require-
ments. In addition, the evaluation is used to provide feedback to school
staff and parents, often consisting simply of the provision of summaries
of results to these audiences. Finally, respondents claimed that the
evaluation report serves as an indicator of success, as a source of con-
firmation of existing beliefs about the program, and as a public relations
document (however, these uses occur only when the results are positive).

From the responses to the direct question of how the evaluation
results are used, it is clear that they do not primarily serve either as
a means of judging the program or as a guide to program improvement. We
pursued this issue in more depth by asking respondents how they judge
programs and how program decisions are made.

From asking respondents how they would demonstrate that their programs
are successful, and how they would make judgments about other programs,
it is possible to deduce why evaluation plays such a limited role in
these judgments. First, when local staff weigh standardized test results
against other sources of information, such as skills-related tests and
personal judgment or observation, the other sources of information
almost always carry more weight. Second, a frequent explanation for
ignoring evaluation results is that the scores are not meaningful because
important background characteristics of schools (e.g., mobility) and
children (e.g., socioeconomic status) have not been considered. Finally,
in the eyes of staff and parents, the evaluation often excludes measure-
ment of goals that they feel are as important as achievement, if not more.
When asked what other types of information they would like for judging
programs, staff and parents typically cited curriculum-embedded and other
skills tests rather than standardized tests, measures of noncognitive
domains such as self-concept and attitude, and measures of program impact
on parents, the community, and the staff itself.

As with judgments of programs, evaluation data are rarely mentioned
in the context of decisions about program changes. Responses to questions
about how program changes are decided suggest several reasons for this
finding. One reason is that programs are quite stable; the changes that
do occur tend to be marginal. Thus the universe in which to find connec-
tions between program change and evaluation is limited. A second reason
is that the results of the evaluations are often not available in time
for use in planning. A third reason is that factors other than evaluation,
such as availability of funds and political concerns, play a major role
in program decision-making. Finally, for the same reasons that evalua-
tions are often ignored in judging program effectiveness (preference for
other types of cognitive measures, belief in personal impressions, and
concern with other outcomes), they are ignored in program planning. There
are a few examples of changes in programs that were motivated in part by
evaluation results, but these are the exception rather than the rule.

Stated reasons for not using evaluations tend to focus on the charac-
teristics of the information they contain, and hence imply that if the
type of information were changed, use of evaluation would increase. A
careful consideration of respondents' statements as a whole, however,
suggests otherwise. There are constraints on evaluation use imposed by
the structure of Title I programs as well as unstated reasons for not
using evaluation results, both of which must be understood in order to
determine effective ways of increasing local use of evaluation.

Two of the constraints imposed by the structure of Title I programs
have already been mentioned: the stability of programs and the timing of
evaluations. Some other constraints were also observed, if not stated
directly by respondents. First, in almost every district there is little
connection between program staff and evaluation staff; this is particularly
true in districts that use external evaluators. Consequently, there is
often little communication and understanding between those responsible
for the administration and content of the program, on the one hand, and
those responsible for the design and conduct of the evaluation on the
other. Second, every Title I program contains multiple audiences with
different information needs which are often overlooked in the design and
reporting of evaluations. Finally, there is the general constraint
imposed by the state of the art in educational treatments. Thus, deficits
in knowledge about what constitutes a successful strategy in education
limit the extent to which evaluations can be fully utilized. This con-
straint reflects not only the lack of proven alternatives, but also the
frustration that the lack produces.

Beyond these contextual constraints, there are two attitudes of
Title I staff that limit evaluation use. The first is that evaluation
is usually perceived in a narrow and potentially threatening way. Evalua-
tion is typically viewed as a set of procedures to provide one's superiors
with information on which to judge the program, on the basis of criteria
defined by those superiors. Hence, evaluation is more likely to be asso-
ciated with accountability than to be regarded as a potential source of
useful information. The second is that most Title I staff are deeply
committed to the program; accordingly, they seek out evidence in support
of their positive feelings toward the program and effectively ignore
evidence that does not support these feelings.

These observations lead to the conclusion that changing the type or
quality of information contained in Title I evaluations will not, by it-
self, significantly affect local use of these evaluations. To achieve
an increase in local use, an evaluation system must attack the factors
underlying the lack of use, both the elements of the program that act
as constraints in themselves and the individual beliefs and attitudes that
produce a negative view of evaluation.

Recommendations

USOE is currently employing two evaluation strategies. The first
is a massive, multiyear study conducted by an independent contractor and
designed to provide a national picture of the impact of Title I on
achievement. The second is the implementation of evaluation models within
the three-tiered (local to state to Federal) reporting system designed to
improve the quality and comparability of locally collected data. The
second effort includes the provision of technical assistance from centers
established in each of the ten HEW regions for this purpose. It is gen-
erally agreed that independent national studies provide the best source
of evidence for the national impact of Title I. Because the best place
to consider ways of increasing local use would seem to be within the
three-tiered reporting scheme, my recommendations refer to this scheme.

First, any strategy designed to increase local use of Title I eval-
uations must be grounded in a Federal commitment to this goal, a commit-
ment that must be understood and shared by the states and communicated
clearly to local districts.

Second, districts need assistance in increasing communication and
cooperation between program staff and evaluation staff. The site visits
suggest that the provision of feedback can be used as one way to facili-
tate understanding between program and evaluation staff. However, the
information fed back must be designed to be clearly understood by staff
and parents and must meet the different needs of different levels within
a district.

Third, Title I staff and parents need assistance in developing an
understanding of the constructive role that evaluation can play as well
as certain types of evaluation skills. Local staff have received little,
if any, training in incorporating evaluation information into planning
and decision making. In particular, they need assistance in learning
how to ask their own evaluation questions. If the primary purpose of
evaluation remains that of answering questions imposed externally, eval-
uations will continue to be perceived as potentially more threatening
than helpful.

Until local staff view evaluation in a positive light, effort de-
voted exclusively to the development of technically sound data will be
wasted in the context of local use. The USOE evaluation models are de-
signed to improve the quality of the data and will not, by themselves,
lead to an increase in local use of evaluation. However, the current
technical assistance strategy, if redirected, could serve as a powerful
force in changing how evaluation is perceived and thereby increase eval-
uation use locally. To accomplish this goal, technical assistance must
be redesigned to communicate a new view of the role of evaluation and to
develop skills such as generating one's own evaluation questions. As
long as technical assistance is defined narrowly as a way of telling
local staff "how to improve the quality of their data," it will not
increase local use of evaluations.






ACKNOWLEDGMENTS 

The quality of any study based on personal interviews is dependent
upon the cooperation and candor of the respondents. In this study, my
staff and I were exceedingly fortunate to meet with individuals who not
only responded willingly and openly to questions, but went out of their
way to arrange complicated schedules, to provide information, and to
graciously play host to strangers. I owe tremendous gratitude to the
many people at the local, state, and Federal level whose contributions
form the backbone of this study.

I am also indebted to Susan Peterson and Henry Acland, who conducted
the interviews along with me. Their sensitivity and insights provided
superb field notes from which to work and contributed substantially to
the analysis. Henry Acland is due thanks as well for sharing with me
the time-consuming task of extracting and classifying the quotations and
reacting to embryonic stages of the report.

Significant improvements from the draft to the final report were
made possible by the thoughtful comments of many reviewers and especially
by the literary skills of David Greene. Finally, Keith Baker of the
Office of the Assistant Secretary for Planning and Evaluation (DHEW)
served as a model project officer, providing valuable advice throughout
the project on its design, implementation, and particularly on the form
and substance of the final report.

I am deeply appreciative of the assistance from all these people
but cannot hold any of them responsible for the final product. The
interpretations and conclusions are mine and do not necessarily reflect
the views of either SRI International or the Office of the Assistant
Secretary for Planning and Evaluation.






CONTENTS 



EXECUTIVE SUMMARY

ACKNOWLEDGMENTS

I    INTRODUCTION AND BACKGROUND
       History
       Design of the Study
       Sample
       Interviews
       Analysis
       Organization of Report

II   PRIMARY USES OF EVALUATION
       Meeting Requirements
       Feedback of Results to Staff and Parents
       Gross Index of Program Effectiveness

III  USE OF EVALUATION IN JUDGING PROGRAMS
       Limits of Evaluation in Judging Programs
       Data Not Considered Persuasive
       Important Variables Omitted
       Important Goals Not Measured
       What Information Is Used or Desired in Judging Programs
       Cognitive Growth
       Noncognitive Outcomes
       Areas Not Related to the Child

IV   USE OF EVALUATION FOR PROGRAM IMPROVEMENT
       Limits of Evaluation for Program Improvement
       Program Stability
       Irrelevance of Evaluations
       Inappropriateness of Evaluations
       What Types of Information Are Used in Making Decisions

V    DISCUSSION
       Interpretation of the Findings
       The Context of Title I Programs
       Underlying Attitudes Toward Evaluation
       Conclusion
       Implications for Policy

REFERENCES



I INTRODUCTION AND BACKGROUND

Title I of the Elementary and Secondary Education Act of 1965 (ESEA)
was the first major social legislation to mandate evaluation. The legis-
lative mandate was vague, reflecting a compromise between Robert Kennedy,
who favored local accountability, and educational interest groups, who
feared it. This mandate has resulted in a 13-year history marked by con-
fusion and disagreement over the purposes of evaluation and the program
itself. Senator Kennedy's original notion was that evaluation would provide
parents and communities with information that could be used to press for
reform. Thus, the original motive behind the evaluation requirements was
a concern with the use of evaluation at the local level.



Since that time, numerous interpretations of the use of Title I
evaluations have been put forth, including a determination of the impact
of Title I nationally, identification of successful programs, and the
provision of information to local staff for improving their programs.
The history of the Title I evaluation strategies adopted by the Federal
government reflects an almost exclusive concern with Federal information
needs. At the same time, the legislative history surrounding ESEA con-
tinues to reflect the view that Title I evaluations should also provide
information useful at the local level in improving programs.

Can the same evaluation system serve both local and Federal needs?
The extent to which Title I evaluations have met Federal information needs
has been studied (McLaughlin, 1975), but little attention has been paid
to the impact of Federally mandated evaluations at the local level, par-
ticularly in terms of their utility in providing information that can
guide program improvement. Therefore, this study was undertaken to look
specifically at local uses of evaluation. Do local staff use evaluation
results to identify strengths and weaknesses of their programs in order
to improve them? Are the proposed changes in the Title I evaluation system
likely to alter local utilization of results? This study was designed to
address these basic questions.



To set the stage for the design and findings of the study, it is
helpful to review briefly the history of Federal strategies adopted for
conducting Title I evaluations and to interpret their intent and success
in meeting Federal and local information needs.

History

Since the original legislation in 1965, each local educational agency
(LEA) annually prepares an evaluation report for its state educational
agency (SEA). Each state in turn compiles the results of the LEA reports
and produces an annual evaluation report for the United States Office of
Education (USOE). The first two years of this three-tiered reporting
system were a major disappointment insofar as they did not produce con-
sistent or comparable data that could be aggregated to provide a national
picture of the effectiveness of Title I. As a result, while USOE was
urged to improve the system and required by Congress (in a 1967 amendment)
to report to them annually on the effectiveness of the programs, a some-
what different approach was instigated by the Office of the Assistant
Secretary for Planning and Evaluation (ASPE). This approach can be
characterized by its reliance on locally collected data, its production
function approach to program effects, and its goal, which was to determine
for national purposes the elements of successful programs. The study,
TEMPO (named after the division of The General Electric Company that con-
ducted the study), was an acknowledged failure. The failure was un-
expected because neither Federal officials nor researchers had faced the
complexity of measuring program characteristics and costs, and because the
desired achievement data were impossible to obtain.*

During the next two years (1967-68 and 1968-69), USOE launched a
third approach: an annual mail survey designed to obtain information on
program characteristics, participant characteristics, and the achievement
of participants. Again, however, the effort to describe the national
impact of Title I on achievement was thwarted by the absence of usable
achievement data; only 5% of the survey responses included achievement
data that could be analyzed.

* The descriptions of the Federal strategies through 1970 are based in
  large part on information given by McLaughlin (1975).

Although the Title I evaluation efforts are described above as
three approaches, they represent essentially the same strategy. They
were all designed to provide a national picture of the impact of Title I
on pupil achievement and they all rested on secondary analyses of locally
collected data. They also shared the same fate in that they were all
considered failures in producing the desired information. The failures
were attributed primarily to the inadequacies of locally collected data
for purposes of national aggregation; their value at the local level was
not a subject of serious investigation.

Until 1971, the only exception to the above generalizations was a
series of studies begun in 1968 and continued through 1971, the "It
Works" series. Since the previous studies had not produced evidence of
large achievement gains attributable to Title I nationally, there was a
political need at the Federal level to demonstrate the success of Title I
in raising achievement. Therefore, USOE commissioned the American
Institutes for Research to conduct a search for exemplary programs, that
is, programs with evaluations showing substantial gains in achievement. The
result was the identification of approximately 30 such programs, although
later studies found that many of these projects either no longer existed
or failed to demonstrate effectiveness.

In 1971, USOE adopted a new strategy in addition to the three-tiered
reporting system. They began to collect primary data directly, instead
of relying on data collected by LEAs. The Compensatory Reading Study,
conducted by Educational Testing Service, was designed primarily to
describe practices in compensatory reading programs and to assess their
effectiveness in terms of achievement and their costs. The results of
the Compensatory Reading Study showed little relationship between program
participation and achievement and, like the former studies, the study was
designed to produce a national picture, not to meet local information
needs. Begun in 1971, the study was not completed until 1976, by which
time the 1974 amendments to ESEA had again changed the Federal evaluation
strategy.



Prior to 1974, legislative language did not reflect an explicit
intent to have an evaluation system that produced information useful to
local staff in improving their programs, a fact which is consistent with
the early preoccupation with Federal needs. By the 1974 reauthorization,
however, evaluation had become a major concern in Congress. In fact,
evaluation activities in general had mushroomed from 1968 to 1973, as seen
by the increase from $1.2 to $20.1 million in USOE planning and evaluation
funds (GAO, 1977). The legislative history for this period reflects the
multiplicity of purposes for the Title I evaluation system:

     The present law requires local school districts to con-
     duct annual evaluations of their Title I programs and to report
     the results of these evaluations to the State educational agen-
     cies. The States in turn must submit periodic reports (includ-
     ing the results of the local evaluations) to O.E. The purposes
     of these requirements are: _to enable each local educational
     agency to assess the effect of its program and to identify
     weaknesses as well as strengths of the project, thus serving as
     a tool for program revision and improvement_; to enable each
     State to determine the extent to which progress has been made
     in reaching State goals for meeting the needs of educationally
     deprived children, as well as to provide a tool for State plan-
     ning, management and dissemination; and to enable the Com-
     missioner of Education to conduct a similar analysis at the
     national level. (Emphasis added.) (USCAN, 1974, p. 4111)

This intent of Congress, combined with the lack of nationally compelling
data during reauthorization when the program was under attack from the
administration, resulted in Section 151 of the 1974 amendments.

Section 151 went far beyond the preceding legislative requirements
for evaluation in its specificity regarding evaluations and the responsi-
bilities of USOE in conducting them. It contains the following require-
ments.

     Sec. 151.(a) The Commissioner shall provide for
     independent evaluations which describe and measure the impact
     of programs and projects assisted under this title. Such
     evaluations ... shall include, whenever possible, opinions
     obtained from program or project participants about the
     strengths and weaknesses of such programs or projects.

     (b) The Commissioner shall develop and publish standards
     for evaluation of program or project effectiveness in achiev-
     ing the objectives of this title.

     (c) The Commissioner shall, where appropriate, consult
     with State agencies in order to provide for jointly sponsored
     objective evaluation studies of programs and projects assisted
     under this title within a State.

     (d) The Commissioner shall provide to State educational
     agencies, models for evaluations of all programs conducted
     under this title ... which shall include uniform procedures
     and criteria to be utilized by local educational agencies, as
     well as by the State agency in the evaluation of such programs.

     (e) The Commissioner shall provide such technical and
     other assistance as may be necessary to State educational
     agencies to enable them to assist local educational agencies
     in the development and application of a systematic evaluation
     of programs in accordance with the models developed by the
     Commissioner.

     (f) The models developed by the Commissioner shall
     specify objective criteria which shall be utilized in the
     evaluation of all programs and shall outline techniques (such
     as longitudinal studies of children involved in such programs)
     and methodology (such as the use of tests which yield com-
     parable results) for producing data which are comparable on a
     statewide and nationwide basis.

     (g) The Commissioner shall make a report to the respec-
     tive committees of the Congress ...

     (h) The Commissioner shall also develop a system for
     the gathering and dissemination of results of evaluations and
     for the identification of exemplary programs and projects ...

The final part authorized funds for carrying out these provisions not to
exceed 0.5% of the program's appropriation.

The Initial response of USOE to this mandate was twofold: first a 
oontract was awarded to RMC Reser rch Corporation which resulted .in the 
development of,' evaluatior models and second, a massive multiyear study, 
the Susta^lning Effects Study, was designed! The former, in fact, had 
been designed prior to the passage of the 1974 Amendments in anticipktion 
of the forthcoming legislative mandate. Its language was vague in" that 
it did not specify evaluation models, but rather asked for a "review and 
analysis of past reports and development of a model reporting system and 
format." The scope of work was expanded, however, and in 1975 produced 
three evaluation models. As described by RMC: 



"The three evaluation models are: Model A, the Norm-Referenced
Model; Model B, the Control Group Model; and Model C, the
Special Regression Model. Each model has variations that enable
it to be used with either normed (Model A1, Model B1,
Model C1) or non-normed tests (Model A2, Model B2, Model C2).

The norm-referenced evaluation design generates a no-treatment
expectation from the assumption that the treatment group will
maintain its status relative to the national norm group from
pretest to posttest without treatment. The control group
model utilizes the posttest (or adjusted posttest) scores of
a control group as the no-treatment expectation. The special
regression design employs the mean posttest score predicted
from a comparison group's regression line as the no-treatment
expectation." (RMC, 1976, p. 7)
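
As a rough illustration of how the three designs differ, the sketch below computes a no-treatment expectation under each model and subtracts it from the treatment group's mean posttest score. It is a minimal sketch under stated assumptions, not RMC's actual procedure: the function names and all scores are hypothetical, Model A assumes scores are already expressed on a norm-referenced scale (so that "maintaining status" means an unchanged mean score), and Model C is reduced to an ordinary least-squares line fitted to the comparison group.

```python
from statistics import mean


def model_a_expectation(treat_pre_normed):
    """Norm-referenced model: the group is assumed to keep its standing
    relative to the norm group, so the expected posttest mean equals the
    pretest mean (scores assumed to be on a norm-referenced scale)."""
    return mean(treat_pre_normed)


def model_b_expectation(control_post):
    """Control group model: the control group's mean posttest score is
    taken as the no-treatment expectation (no covariate adjustment here)."""
    return mean(control_post)


def model_c_expectation(comp_pre, comp_post, treat_pre):
    """Special regression model: fit posttest on pretest in the comparison
    group, then predict at the treatment group's mean pretest score."""
    mx, my = mean(comp_pre), mean(comp_post)
    sxx = sum((x - mx) ** 2 for x in comp_pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(comp_pre, comp_post))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * mean(treat_pre)


def estimated_effect(treat_post, expectation):
    """Observed mean posttest minus the no-treatment expectation."""
    return mean(treat_post) - expectation


# Hypothetical scores, for illustration only.
treat_pre, treat_post = [30, 30], [36, 38]
print(estimated_effect(treat_post, model_a_expectation(treat_pre)))
print(estimated_effect(treat_post, model_b_expectation([33, 35])))
print(estimated_effect(
    treat_post,
    model_c_expectation([40, 50, 60], [42, 52, 62], treat_pre),
))
```

The same posttest scores can yield different estimated effects depending on which expectation is used, which is one reason the choice of model matters for comparability.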

Meanwhile, the plans for the Sustaining Effects Study awarded to Systems
Development Corporation came under attack from Congressional staff. The
study was undertaken for two purposes, according to USOE: to report on
the numbers of economically and/or educationally deprived elementary
school students who do and do not receive compensatory services; and to
report on the benefits students derive from such services during more than
one school year. The initial design was for a seven-year study at an
estimated cost of approximately $25 million. According to Congressional
staff, there was concern that with the bulk of the money going to a single
national study, there would be little left to upgrade the evaluation
capabilities of state and local districts as intended by the legislation.
After negotiations between USOE and Congressional staff, the Sustaining
Effects Study was reduced in scope and OE created Technical Assistance
Centers (TACs) in each region, which began operating in October of 1976.
The TACs provide free consulting services to SEAs, and through them to
LEAs, on all aspects of Title I evaluation but particularly on the
implementation of the evaluation models.

In summary, the two primary studies currently under way are based
on strategies essentially the same as those in operation over the past
six years: an independent national evaluation and the three-tiered
reporting system. The difference is that considerable effort has been devoted
to improving the methodology of both strategies. This is reflected in the
longitudinal nature of the Sustaining Effects Study and in the evaluation
models proposed as part of the three-tiered reporting system, combined
with the technical assistance on their implementation.

It is clear that national studies are not intended to provide
information for local use, but rather to provide a national picture for USOE
and Congress. Similarly, the development of evaluation models to be used
in the three-tiered reporting system was motivated primarily by Federal
needs. As the original Request for Proposal (RFP) for this development
stated:

This statement of work is intended to improve the data 
quality in LEA reports to states and State Title I Evaluation 
Reports submitted to USOE. In combination, these efforts 
should significantly improve the national data base upon
which Title I impact is evaluated annually. (RFP 74-39, 1974)

The RFP went on to note that, for the purpose of obtaining national
impact data, use of the on-going data collection efforts by LEAs and SEAs
is less preferred than the use of data collected in a national study
expressly for that purpose.

As this brief history demonstrates, Title I evaluation began with
the idea of locally collected data passing up through a three-tiered
system to the national level. The failure of this system to provide
nationally useful data led to the current system, designed to impose
procedures on LEAs to ensure that their data are compatible with national
needs. Nevertheless, it is evident in the current reauthorization
proceedings that there is still a strong desire on the part of Congress for
data that can also be used locally. The House version of the bill (yet
to go to conference) explicitly refers to local use in its evaluation
requirements for LEAs, to wit:


A local education agency may receive funds under this title 
only if ... the evaluations address the purposes of the
programs ... and ... the results of those evaluations will be
utilized in planning for and improving projects and activities 
carried out under this title in subsequent years. (H.R. 15) 

In conclusion, the three-tiered reporting system is the only Federal
strategy under way that carries the potential for providing locally
useful data. Given the emphasis in the development of the proposed system
on the national need for data that can be aggregated, it is reasonable to
ask whether the new system will meet local needs better than its
predecessors. Therefore, this study was designed expressly to investigate the
extent to which the Title I evaluation system has been providing data
that are used by local Title I staff in planning for and improving their
programs; and to anticipate the impact of the new evaluation models and
Technical Assistance Centers on the local utility of the data collected
under this system.

Design of the Study

The design of this study was influenced by some recent research on
the connections between evaluation findings and decision-making (for
example, Weiss, 1977; Cohen and Garet, 1975; Frankel, 1976). This
influence was primarily one of limiting expectations, which in turn
influenced the approach to data collection and the selection of districts
to be visited. Literature on evaluation has only recently included
attempts to understand the role and use of evaluation results, particularly
in the realm of program planning and decision-making. Partly in response
to the absence of compelling evidence of evaluation utilization in
decision-making, this research has tempered idealistic notions of clear
connections between evaluation results and decisions. It has begun to
suggest the bounds within which evaluation can reasonably be expected to
provide usable information, and shows that the role of evaluation in
decision-making can be important even if indirect and elusive (for example,
in setting a climate of opinion or lending weight to common sense
understandings about programs).



On the basis of these findings and my own work in local school
districts, I embarked on this study with highly restrained optimism about
being able to identify uses of Title I evaluation results. I did not
expect to find very many examples of use of evaluations, either in
judgments about programs or in decisions about changing programs.
Therefore, I decided first that it was essential to interview Title I staff
and parents in person, allowing enough flexibility to adapt questions to
respondents and their situation and to probe into each LEA's
decision-making process. I also decided consciously to select districts reputed
to have an above-average emphasis on or concern with evaluation results.



Sample



To identify districts that emphasized evaluation, I asked
knowledgeable persons for recommendations, including directors of the Technical
Assistance Centers, USOE staff, other researchers, and local staff with
whom I was already acquainted. For most of the sample, the final
selection was based on recommendations of the state Title I directors in
states suggested to me. After I explained the purposes of the study, the
state directors suggested several districts that met my criterion, from
which I chose those to be visited.

We visited 15 districts in 6 states: 3 in California, 3 in
Washington, 3 in West Virginia, 3 in Iowa, 2 in Nebraska, and 1 in New Mexico.
Of the 15 districts, 10 were small to medium-sized cities ranging in
population from approximately 75,000 to 500,000, with a median of
approximately 200,000. The remaining 5 districts were rural to suburban
with populations ranging from approximately 10,000 to 200,000.

I was able to augment the sample with an additional 15 districts
through the cooperation of the Huron Institute and USOE. The Huron
Institute was concurrently collecting similar information from local
districts in their USOE-funded study on the feasibility of developing
evaluation models for Title I early childhood programs. Their sharing
of findings in effect doubled the sample and expanded the range by
representing an additional 6 states and including 4 cities with populations
between 500,000 and 1 million.

Both the selection procedures for the sample of districts and the
size of the sample clearly preclude statistically valid generalizations
to the nation as a whole. Choosing districts especially concerned with
evaluation, however, ensures that the findings are based on situations
with the greatest potential for use of evaluation. Therefore, these
districts should represent the high end of the continuum of evaluation
use in program judgments and decisions. Similarly, conclusions concerning
factors inhibiting evaluation use will apply even more to districts with
less emphasis on evaluation not represented in the sample.

The sample also Included four state Title I offices. For the purposes 
of this report, the findings from the state-level interviews serve as a 
background for the interpretation of findings from the districts. Because
of time and budget constraints, however, the information gleaned from the
states is not reported here. I hope to expand upon the data base and
report on state-level findings in the future.


Interviews

A district visit was made by one or two interviewers for one to two
days. In each district, we interviewed the Title I director, other
project administrators, the Title I evaluator, principals of Title I schools,
Title I teaching staff, and parents of Title I students. In some districts,
non-Title I administrators, such as the superintendent, were also
interviewed. Title I director and Title I evaluator are my terms for the
persons responsible for the administration and evaluation of the program.
Their actual titles varied from district to district, as did the titles of
other Title I administrators.

In each district, the interviews were set by either the Title I
director or evaluator and were done either individually or in small groups,
depending upon scheduling convenience. Generally, the interviews lasted
from one-half hour to one hour and occurred either in the central office
or at the school sites. The interviews were structured to the extent
that the same topics were pursued in each interview, but the emphasis on
each topic and specific questions were tailored to each situation and
respondent. The categories of topics included:

Characteristics of the Title I program
How program decisions are made
Characteristics of the local Title I evaluation
Knowledge of the local Title I evaluation
Uses of evaluation results in judging programs
  and in program planning
Knowledge of and reactions to the evaluation models
  and TACs.






We chose not to tape record the interviews in order to maximize candor on
the part of the respondents. However, we did take extensive notes, in- 
cluding as many verbatim quotations as possible. 

In addition, in each district we obtained copies of their evaluation 
reports and other related documents. 

Analysis

The task of synthesizing approximately 1,000 pages of typed field
notes is awesome. The approach consisted essentially of reading the
notes several times and tentatively drawing a set of generalizations from
them. For each generalization, the notes were gone through carefully,
extracting evidence in support of and opposed to the generalization.
After this stage of refining the general statements to be reported,
statements illustrating each point were extracted from the notes. From these
statements, examples were selected for inclusion in the report. Thus each
quotation reported in the text represents a much larger set of quotations
illustrating the same point. This procedure was followed to ensure that
the quotations were indeed representative of the districts. Since the
interviews were not taped, the quotations are not demonstrably verbatim.
They do, however, reflect the words of the respondents as closely as
possible and capture the flavor of the response. For this reason, the
vast majority of the quotations are based on the 15 districts we visited
personally; and those from the Huron Institute sample are used sparingly.


Organization of Report

The body of the report is composed of three sections: primary uses
of evaluation, uses of evaluation in judging programs, and uses of
evaluation for program decisions. These sections are intended to be
primarily descriptive; however, the urge to interpret has not been
completely controlled. The final section contains interpretations and
conclusions drawn from the findings and their implications for the future of
Title I evaluation.






II PRIMARY USES OF EVALUATION 

I found that the main part of the district Title I evaluation report
for all the LEAs visited consists of posttest or gain scores reported
for each project on standardized achievement tests. A few evaluations
included additional information, such as the results of questionnaires 
given to staff and parents soliciting their opinions of the project. On 
the whole, however, program evaluation is synonymous with standardized 
achievement test scores. Accordingly, the findings presented throughout 
often indicate uses of and attitudes toward standardized achievement
tests rather than the evaluation report per se. 

This section presents the responses to the general question: How
is the Title I evaluation used in your district? The most frequent
responses fall into three areas: to meet requirements, as feedback to
school staff and parents, and as a rough index of the program's impact
on achievement.

There is little doubt that the primary function the evaluations
serve is to meet the state and Federal reporting requirements of Title I.
Districts employ standardized tests because they are the simplest way of
meeting the Federal mandate as interpreted by their state. LEAs are
totally accustomed to the fact that receiving Federal money has a number
of strings attached to it, of which the evaluation requirement is merely
one. For example,

This district will accept all strings that go with the Federal
money. Richer ones might not but we need the money.

(Director)*

*Throughout the text the type of respondent is identified in parentheses.
All directors, administrators, evaluators, teachers, and parents are
part of Title I. Principals are all in Title I schools and non-Title I
administrators are all superintendents.



Therefore, with the exception of staff concerns with the time devoted to
testing and the reporting burden, evaluation is usually perceived as just
one of the many hoops to go through in order to receive the funds.

We go along with externally imposed regulations as long as they 
do not impose an overwhelming burden. When they are burdensome, 
we will exercise our own judgment about what is legitimate and 
not go down without a fight. (Evaluator)

Evaluation is not a burden; it is an unnecessary but required 
evil. It does little harm but is of no particular use. 

(Teacher) 

So long as the burden is not undue and some local autonomy is preserved 
in designing their program, most local staff responsible for conducting
the evaluation are concerned primarily with meeting the legal require- 
ments: 

Testing is an economical and straightforward way of complying
with the regulations; we send the data in and then go about
our business. We're not going to lose any sleep over whether 
or not the results show effectiveness. (Evaluator) 

Providing data to meet evaluation requirements is an accepted fact 
of life. Title I staff also believe that the Federal government has a 
right to request the data because they are footing the bill. Moreover,
many but not all Title I staff think that there is a real need for the
data at higher levels (i.e., district, state, or Federal). One district
director described the perceptions of his staff in the following way:

Teachers feel that all this data collection goes on because
the state needs it or more generally the government needs it
and they are sympathetic with their need for knowing what
happens with their money. But outside of this necessity,
they see little purpose. (Director)

Similarly, in another district,

There is a real need for the big picture at the state and
national levels. (Director)

In the context of the new USOE evaluation models, the district evaluator
said:

I can see the Federal and state need to demonstrate bang for
the buck but cannot see why they avoid educators in coming
up with guidelines. (Evaluator)




Some other respondents, however, were less sanguine about the
appropriateness of aggregating these data for national purposes. For example,

[illegible] three-tiered scheme will give the feds what
they want. A national picture is not appropriate. You have
[illegible] differences and the accommodations
wash out the differences. (Evaluator)

Feedback to Staff and Parents

The second primary use of evaluation results is to provide feedback. 
Feedback in this context connotes simply communicating evaluation results 
to program staff and parents. Theoretically, this is the area that pro- 
vides the greatest potential for use of evaluation in making judgments and 
decisions about programs leading to improvements. As one district
administrator stated: 

If the test data are not useful to the principals, they
aren't useful at all. (Administrator)

All districts provide some type of feedback, but the type of information
fed back varies enormously. At a minimum, feedback consists of sending
the evaluation report to the Parent Advisory Council (PAC) and the
principals of Title I schools. This situation is the one least likely
to lead to any utilization (or even understanding) of the information.
Principals rarely look at the report under these conditions, and teachers 
often do not see it. Most districts, however, provide school by school 
results, and sometimes class level results, which are transmitted to the 
appropriate individuals. 

Sometimes this information is quite comprehensive. For example, in
one district each Title I school receives a 15-page mimeographed document
containing graphs of the relationship between school level poverty indices
and achievement (with the particular school's code circled), detailed test
score results for the school (by subtest and skill area), with national
percentiles, and a comparison with the previous year's data for that school.
It also contains other descriptive data on school and community
characteristics such as mobility, enrollment, and income. The introduction to
the report reads:






The purpose of this report is to share information about students
in Title I schools in ______. It is intended that the report be
seen not as an evaluation report but as a collection of
information that will help administrators, teachers and parents plan
even stronger programs for the children in these schools.

Much of the information reported here was collected as part of
the data base used to evaluate Title I programs. In
addition, the ______ Research and Testing division
contributed data it has gathered through the state mandated
testing program.

This district was extraordinary in the efforts put forth by the evaluation
staff to make evaluation part of the program planning effort. They go to
considerable effort to present the information for each school clearly
and to explain the findings in person to teachers, parents and the
principal. In this district, as well as others, it was stated that
feedback that included personal explanations by evaluation staff was much more
likely to be understood and make an impression on the school staff, and
hence have the potential to be utilized. In another district the director
said:

Principals won't make any use of evaluation results if you just
send data; you need to go talk with them about it. (Director)

In sum, the provision of feedback, particularly when explained in
person, provides what may be a necessary but not sufficient condition for
utilization of the evaluation information.

Gross Index of Program Effectiveness

Although the primary local uses of evaluation are to meet requirements
and to provide feedback, other uses are not precluded. The third major
use falls under the category of gross or rough indications of program
effectiveness. This category differs from the previous ones in that it
occurs at the individual rather than the system level. The use of
evaluation as a gross index of program accomplishments takes several forms,
the most common of which is use of evaluation to confirm one's existing
beliefs about a program. For example:

The main purpose the test scores serve is to support your own
views. (Teacher)




I look at test scores mainly to confirm my own impression.
If they differ, my impression counts. (Teacher)

A related use of evaluations under this category is that of giving 
a rough index of program success, but not as a guide to action. For
example, 


Standardized achievement tests provide an indicator of where
children are. (Teacher)

Tests can only be interpreted as a rough guide. (Principal)

Also related to this category is the use of evaluation as a public
relations document.

I want information to justify expansion of the program. I'm
not interested in information showing students are behind
national norms. (Superintendent)

Illustrating another form of public relations, in one district the
evaluator explicitly pointed out the need to use the evaluation report
as a way of educating the district administration and board to have
realistic expectations about the effectiveness of their Title I program.
Similarly, in another district the reading program director described
the situation in which a school wanted to withdraw from participation in
Title I:

They claimed that Title I was associated with a decline in
test scores. We were able to pull out the evaluation report 
and demonstrate that this was not true. (Administrator) 

In another district, the superintendent stated: 


Day-to-day problems don't show up in the evaluation.
Subjective feedback is often more useful in daily operation of
the program. The other stuff is what you impress people
with. (Superintendent)

These common uses of evaluation (as a source of confirmation of
existing beliefs, as an indicator of success, and as a public relations
document) share an important characteristic: they are triggered only by
positive results. Thus, the evaluation report as an end in itself (apart
from meeting requirements) is seen as useful only when the results are
positive. When the results are negative, the evaluation is discounted
for any number of reasons. (Elaboration of these reasons is contained in
the following section.) Thus, it is often the case that negative
findings, rather than being taken as informative about the program, are
viewed as an annoyance that must be explained away. As two examples,

None of the subjective stuff is included in the evaluation
except occasionally to explain low results. (Superintendent)

One year the scores for second grade were low. We looked for
the reason by talking with the teachers to see if the skills
tested matched the curriculum and if the students' scores
matched the teachers' judgment. From this, we concluded that
the test was invalid. (Evaluator)

In another district, the district staff were all extremely upset over
results that showed negative NCE growth.


We are having the TAC reanalyze our data looking for floor
effects. We know instructional growth is taking place.
The negative results hurt us in several ways. First,
Congress is always talking about the possibility of tying funds
to gains and, second, we get a bad reputation. Poor results
limit our ability to share information about the program and
lead to low morale. (Director)

And in another district, one school had very low scores:

We discovered that there had been an influx of Vietnamese
students into the school. In another school with low scores
we found that there were a number of students who were near
EMR (educable mentally retarded). (Director)

However, not all evaluations with negative findings are ignored.
There are a few instances in which they are taken as a gross indicator
of a weakness, but this occurs primarily in the context of needs
assessment. Although the same set of standardized scores, or at least the same
type, are used for both needs assessment and evaluation, they are far more
likely to be seen as useful and acted on when they are viewed as needs
assessment data as opposed to evaluative data. This point is expanded
upon more under the section on what information people claim they use.






Ironically, there is almost unanimous agreement that standardized
tests (especially when combined with teacher judgment)* form a good basis
for selecting students for the programs. Although this use does not
relate specifically to evaluation, it is mentioned here because it is a
widely approved "good" use of standardized tests in a world in which they
are usually criticized severely.

In summary, the primary local uses of Title I evaluations are to meet
legal requirements, to provide feedback, and to provide gross indicators
of program effectiveness. Title I evaluations do not seem to serve, as
primary purposes, either as a basis on which to judge the program or as
a guide to program improvement. Since direct inquiries about uses of
evaluation results did not reveal use in program planning and improvement,
we pursued the issue in more depth by asking respondents how they judge
the programs and how decisions about programs are made. The findings
from these inquiries are reported in the next two sections.



*Two districts were exceptions. One felt strongly that teacher judgment
should not be included and another based selection exclusively on
teacher judgment.






III USE OF EVALUATION IN JUDGING PROGRAMS


This section discusses evaluation in the context of judging the
effectiveness of programs. Everyone involved in a Title I program makes
judgments about its effectiveness. These judgments can be an end in
themselves or they can be the basis for deciding how to change the
program in order to improve it. Evaluation is viewed somewhat differently
from these two perspectives; therefore, I treat these two perspectives
separately. Section IV presents the findings on evaluation in the
context of program planning and redesign.

Before discussing the ways in which evaluation results do affect 
program judgments, it is useful to consider the reasons that limit their
utility. 


Limits of Evaluation in Judging Programs

From asking respondents how they would demonstrate that their
programs were successful and how they would make judgments about other
programs, it is possible to deduce why evaluation plays such a limited
role in these judgments. There are three classes of reasons limiting
the impact of evaluation on judgments of program success: the data 
they provide are not considered as persuasive as other sources of in- 
formation; the analyses ignore important mediating variables; and the 
evaluations don't measure important goals. 

Data Not Considered Persuasive

When local staff weigh standardized test results against other
sources of information in judging the success of their program, the
other sources of information almost always carry more weight.
Conflicting information from standardized tests and sources such as criterion
or other skills related tests and personal judgment (gleaned from
observation, intuition or some combination) is inevitably resolved in
favor of the other sources. Some examples follow:

At the school and teacher level, more attention is given to
criterion referenced and diagnostic tests. If a teacher
sees inconsistency between the CTBS results and results on
these other instruments, she will believe the latter.

(Evaluator)

Individual diagnostic tools provide the basis for my
judgment of program success, not the standardized tests.

(Principal) 

The CTBS tells us by grade where the school is the weakest.
We also use our curriculum tests. The results don't always
match; then we go with the curriculum tests because they
are more immediate and frequent. (Teacher)

I would trust my own opinion over a test score. (Teacher)

In general, as one director put it: 

People use evaluation to support their beliefs but will not
change their beliefs on disconfirming evaluation evidence.

(Director)

Two findings connect this last point to the fact that negative standard- 
ized test results are usually ignored. First, school staff (and generally 
all Title I staff and parents) are happy with their programs. Second,
evaluation results are looked at primarily with an eye toward confirming
beliefs (see Section II). Together, then, positive results serve to
reinforce existing positive feelings toward the program but negative
results are ignored and, if necessary, explained away as inappropriate.
When the results are negative, it does not seem to be the case that staff
already knew there was a problem; in fact, the case is usually that the
problem perceived is not with the program but with the tests.

The above examples illustrate the ease with which tests are written
off when test results are incompatible with existing beliefs about
program effectiveness. The widely publicized methodological critiques of
standardized tests facilitate this process in that people who are
displeased with test results can quickly call to mind "scientific" reasons
for rejecting the tests. As one administrator stated:

If the standardized test scores are negative, it's okay
because everyone buys the argument that they can be discredited.

(Administrator)




And if the problem isn't with the tests, it is with the testing conditions:

If my judgment and the test scores tell different stories, I
believe my judgment and look for explanations such as problems
in giving the test, or how the child felt. (Principal)

Or there is a problem with the analysis, as described below.

Important Variables Omitted

A frequent explanation for ignoring standardized test results in
judging programs is that the scores are not meaningful because important
background characteristics of schools or children have not been
considered. Explanations of this type usually arise in the context of negative
or low test results and potential comparisons with other schools or
programs. For example:


Evaluations must take into account the amount of time devoted
to instruction. You can't compare programs with different
amounts of instructional time or with different goals.

(Teacher)

The evaluation should have more information on the
characteristics of the kids because there can be big differences
between schools in socio-economic status and mobility and
other things you can't measure readily. (Teacher)

Each school in the district has different objectives. So a
good school may be ranked lowest because it has harder
objectives.

(Parent)

The school's drop in ranking can be explained by several
factors not having to do with the program. You need to take
into account the students' IQ, the number of students per
staff, and the amount of instructional time per student.
And some schools exclude students with low IQs when it comes
to testing while others include them. (Principal)

I would like to see more sophisticated efforts to adjust
students' expected levels of achievement for a variety of
factors: attendance levels, number of schools the student
has attended, number of programs he has been involved in,
if he uses a second language, has a learning disability or
if he comes from a broken home. (Principal)

There is great difficulty in using the same tests even if
restricted to programs with the same goals because of
differences in populations. For example, the bottom kids in this
state are not as low as the bottom kids in New Jersey.

(Evaluator)




Important Goals Not Measured

Finally, in judging their program's effectiveness, staff and parents
look to information that assesses what they believe to be the most
important goals of the program, usually in addition to, but occasionally
instead of, achievement.

We would like to see all kinds of alternative goals given
equal place: parent involvement, student self-concept,
attendance rates, library records and student enthusiasm.

(Administrator)

Test scores on the CTBS don't say very much about whether the
program was successful. Test scores are less important than
growth in the affective areas. (Teacher)

Evaluation data do not show what is effective. Teacher-pupil
relationships and the quality of the teacher are what makes
the biggest difference. (Director)

A related concern vis-a-vis goals is that the emphasis on achievement
tests has narrowed the focus of Title I.

Title I was first a poverty program; now it is entirely
achievement -- all activities are now instructional, as a
result, in part, of using standardized tests; also
because achievement tests are used as allocators at the
school level. (Evaluator)

I would like to do more than reading and math but you can't
measure them so the state won't allow it. (Principal)

We are suspicious of all hard data and see Title I shifting to
reflect an obsession with testable outcomes. (Administrator)

Why the concentration on math? Because 2 + 2 = 4. You can
make clear assessments of what students know and this is much
harder to do in reading. (Principal)

In summary, there are multiple reasons for the minimal use of
evaluations in judging program effectiveness. Generally, the reasons that
are stated reflect preferences for measures of achievement other than
standardized tests, a fear of misleading comparisons, and the view that
programs have multiple goals. How then do local staff and parents reach
conclusions about the effectiveness of their program? This topic is
discussed below.





What Information Is Used or Desired in Judging Programs?

It is impossible to isolate precisely the information on which
individuals actually base their judgments of program effectiveness.
Psychological theories (e.g., cognitive dissonance) suggest that there are
many important variables to consider besides the availability of
certain types of information. Nevertheless, since one purpose of this
study is to provide a starting point for considering how evaluations
might be made more useful, it is helpful to report the types of
information that respondents cite when asked about program effectiveness. I
consider both what information respondents claim they use and what other
types of information they say they would like.

The findings reported below are grouped into three major categories:
information related to cognitive growth, growth in noncognitive areas,
and outcomes in areas not related to the child. The responses described
under these headings were elicited primarily by asking questions such
as: How would you convince me your program is a success? If you were
choosing a new program, what would you consider?

Cognitive Growth

Most respondents are concerned with growth in cognitive areas,
usually reading and math. Thus the tendency not to cite evaluation data
as a source for program judgments is more a reflection of the perceived
limitations of standardized tests than of the domain being assessed.
For example,

Standardized achievement tests provide an indicator of where
children are but they do not provide very specific
information about skill attainment. (Teacher)

Respondents, particularly teachers, are more likely to cite specific
measures of skills as better indicators of growth than standardized
achievement tests -- but just as frequently they cite their own observations
and experiences. Hence,



If I were to judge a program I would first look at the written
goals of the program and then at the specific goals for each
child. I would want to see pre and post test scores on
individual skills rather than standardized achievement tests.

(Teacher)




In judging the effectiveness of the program I look at scores
on the CTBS and the Nelson and I rely on my own observations.
You can just tell if a child is improving. (Teacher)

I judge the program on the basis of my own experience. And
I would back this up with the opinions of teachers and
parents when the students get to the higher grades. (Principal)

Staff also prefer to base their judgments on relative rather than
absolute (external) standards; that is, they want to assess progress
individually as compared to where the child started:

What matters is how far students have come -- not whether
they're at grade level. (Teacher)

I would judge students' gain by where they started and amount
of instruction they received. (Administrator)

Parents, understandably, rely primarily on observation of their own
child. For example:

I know whether my child can read by observing him. I have
seen increases in the number of books he brings home, and
the amount of time he spends reading and this is evidence
to me that the program is helping my child. (Parent)

Additionally, staff frequently expressed an interest in basing
judgments on the long-term impact of the program -- information rarely
contained in evaluation reports. For example:

I would like to know how the students do in ninth grade as
judged by their teachers. Are the gains sticking? Will they
graduate? Are they interested in school? (Principal)

I would like to see a longitudinal study over 12 years based
on achievement scores. Also to know where students arrive,
what their outlook is on society and on themselves.

(Principal)

As some of the above quotations indicate, staff and parents are also
interested in noncognitive child outcomes.



Noncognitive Outcomes

In most cases, staff and parents are interested in both cognitive
and noncognitive outcomes; hence, their comments concerning program
judgments cannot always be clearly sorted between the two categories. In
addition, it is generally agreed that there are few, if any, good
noncognitive measures. Usually staff and parents cite their own observations
or those of others. Some examples:

To convince someone the program was good I would use the Read- 
ing Inventory Test of Skills, even though it is not normed. 
Also, observation of students' motivation to see changes in 
personality and attitude. (Administrator) 

To see if the program was effective, I would look at four
things: how well the student was doing in other classes,
especially in areas which first caused them to come to the
reading lab, pre and post scores on the ICRT, teacher
reports, and the students' attitudes to the program. (Teacher)

I make my judgments by looking at the children. Can the child
perform? Is he at ease? Does he have a good self image?

(Teacher)

The program is successful if students get attached to their
teacher, if they want to go to the program. You also know
something special is going on if students not in the program
want to join it. Parents get a sense of the program and
communicate it to their children too. (Principal)

The program is effective if children know what they are doing.

(Teacher)

Generally, then, school staff express interest in areas such as student
attitude and self-concept, although formal measures are rarely cited as
information sources for these areas.



Areas Not Related to the Child

In addition to judging program effectiveness on the basis of
information about the participants, either cognitive or noncognitive, some
staff expressed interest in the effect of the program on groups other
than children. As examples:



There are lots of ways of telling if the program is effective.
Test scores are one. Others are the working relationships,
the atmosphere and community attitudes toward the program --
perhaps the last is most important. The community lets you
know if anything is wrong. (Principal)

Yes, the program is a success because the parents and the
kids think it is helpful and the teachers are enthusiastic.

(Administrator)




After initial resistance, the staff has become supportive and
you can tell it [the program] is a success when teachers say
good things about the kids. (Principal)

Overall, although most staff are concerned primarily with the
program's impact on children, there is interest in knowing the impact of
the program on the community, parents, and staff itself. As with
noncognitive outcomes, however, little mention was made of formal ways of
measuring these areas of interest.







IV USE OF EVALUATION FOR PROGRAM IMPROVEMENT* 


Continuing the search for instances of evaluation use, I turn from
the question of how people judge a program's effectiveness to the
question of how program improvement occurs. Specifically, I investigated
how decisions about program changes are made and the extent to which
evaluation data are mentioned in this context. As with the discussion
of uses of evaluation for judging programs, this section considers first,
the limits of evaluation for program improvement and second, the types
of information that are used in program improvement.

Limits of Evaluation for Program Improvement

A determination of the utility of evaluation results in program
improvement must recognize that local districts have several levels of
people involved in Title I, each with different information needs and
decision-making authority. Administrators are concerned with the
program as a whole; principals are concerned with their schools; teachers
with their classes and parents with their children. Although their
information needs are not necessarily mutually exclusive, they often differ
substantially.

Reasons for lack of use of evaluation for program improvement can
be roughly categorized in three groups: programs are quite stable;
evaluations are irrelevant; and evaluations are inappropriate.

In considering the use of evaluation (and other information) as a
basis for making decisions about program changes, I found that Title I
programs are, by and large, remarkably stable.



*These findings are limited to the 15 districts visited by the staff.
The Huron findings are not included because this topic was not pursued
in their study in sufficient detail for purposes of this section.



At least 5 of the 15 districts stated this clearly. As examples:

There really isn't much program planning going on any longer,
more a matter of continuing to operate the way they have
been going. (Director)

Major changes in the program are never made. (Teacher)

There have been no basic changes in Title I. The goals and
methods are largely unchanged. (Director)

From the small number of examples cited when respondents were asked about
program changes, it is clear that they are limited in all districts.
Therefore, it should be kept in mind that the universe in which to find
connections between program changes and evaluation is quite restricted.



Irrelevance of Evaluations

The finding that evaluations are considered irrelevant to program
decisions is in part an inference based on staff comments concerning the
overriding importance of other factors (e.g., administrative, budgetary,
political). These comments are discussed later. Other indications
that evaluations are viewed as irrelevant include distrust of evaluation
in general and the practical constraint of timing. For example:

I doubt that testing provides the kind of information on
which to base decisions. Title I was designed to let
locals define needs. Local philosophies and priorities
should shape the program. (Director)

Implying that the whole notion of evaluation is irrelevant, another
director said:

How can you evaluate when kids are starting at different
places and developing at different rates? Means don't
mean anything. (Director)

Finally, if the evaluation results are not available when decisions are
made, they are irrelevant. In all districts there is a delay between
data collection and reporting of results. Usually the evaluation is
based on a spring test administration and the results are not reported
until the following fall. This means, first, that program planning for
the next year has already occurred -- often during the spring even prior
to the administration of the posttest. Second, at the teacher level,
the students who were evaluated are no longer with the same teacher.
Although theoretically data one year out of date are not totally useless,
some staff suggested that this timing did preclude utilization. As
examples:

Evaluation reports cannot be included in planning because plans
must be submitted to the state before evaluation is available.
Planning must be done at the busiest time of the year.
(Evaluator)

Data from the previous spring are too late to be of use, except
to purchase materials. (Administrator)

Inappropriateness of Evaluations

Most staff interviewed did not speak directly to the issue of
appropriateness of evaluation for making program decisions. The most
obvious explanation for this is that staff do not view evaluation as a
possible guide for program improvement. Instead, "evaluation" in all
contexts is interpreted as a means for someone else to judge the
effectiveness of the program. Thus, "evaluation" tends to be associated with
accountability rather than with information for identifying strengths
and weaknesses of the program. Ironically, when test scores are referred
to as "needs assessment," the reaction to them can be quite different.

The only way we were able to determine any connections between
evaluation and program decisions was to work backwards from program
decisions previously made. We asked what changes had been made in the
program and then asked why the noted changes, if any, were made. From
this we were able to determine the extent to which standardized test
results and other types of information played a role in the decision to
make changes in the program.

Although this approach generated examples of information used in
making changes (below), it did not produce spontaneous statements about
why evaluations were deemed inappropriate. Therefore, my conclusion
that evaluations are usually considered inappropriate for program
decisions is based on inference rather than direct statements. All staff
and parents make judgments about the program, as we have noted. Because
judgments provide the starting point for actions or decisions, I infer
that staff would claim that evaluations are inappropriate for decisions
for the same reasons that they give for their inappropriateness for
program judgments -- that is, because standardized tests are not convincing
measures of achievement and people are concerned with outcomes not
addressed by evaluations.

What Types of Information Are Used in Making Decisions?

Because of the limited number of program changes, and hence the
limited evidence concerning their causes, this discussion is constructed
somewhat differently from the preceding ones. From the field notes for
each of 15 districts, I extracted every example of a connection
between a program decision and some kind of information (defined broadly).
These examples were elicited primarily in indirect fashion, through
inquiries first on how the program had changed recently and then on why
the changes were initiated. The examples should be interpreted in the
light of how they were collected; to wit, we took all responses at
face value. We did not attempt to trace program decisions to a primary
source nor to resolve conflicting explanations from different
respondents in the same district. Because no program change stems from a single
cause, and perceptions of causes often differ, such a task would have
been impossible. For example, in one district parents were convinced
that they had been responsible for the introduction of a math component;
administrators, on the other hand, felt that the program had been
initiated because they perceived the need and funds were available.

I identified in total approximately [illegible] illustrations of connections
between program changes and information. The types of information cited
in these illustrations can be grouped roughly into four categories:
evaluation; fiscal/political; "subjective"; and "objective," with the
understanding that several illustrations fall under more than one
category. Examples from each category are presented below.

About one-quarter of the illustrations cited evaluation or test
scores as contributing to a change in the program. Two examples are:




From the survey information in the evaluation, I saw that
some teachers in the school weren't as well informed
about the Title I program as they should be so I made it
a point to work with them more. (Teacher)

I circle the high and low posttest scores and meet with
the teachers on weaknesses to consider for next year's
[illegible]

(Administrator)

Several examples in this category suggested less than a compelling
connection between the test scores and the change initiated (or the
change was described in such vague terms that the connection was
difficult to determine). Some examples are:

I took heed to the low scores in comprehension and did
some inservice. (Principal)

Test results showed that students did poorly in drawing
inferences. The school responded by beefing up materials
in this area. (Teacher)

I look at the class results to see if anything is out of
phase. I found some had dropped in math and diagnosed
the problem as three different approaches being used
school-wide. So I picked the one most widely used and
stopped the rest. (Principal)

We stopped serving three-year olds because they scored
too high at the end of the year to be eligible as
four-year olds. (Director)

These examples suggest that changes associated with test scores tend to
be minor (excepting the last), and that the vagueness of the changes
perhaps reflects the state of the art in the field of education -- limited
clear remedies even when a weakness has been identified.

The second category of illustrations suggests that fiscal and
political considerations are at least as important as evaluation in
motivating change, based on the fact that they also represent about
one-fourth of all the illustrations. They tend, however, to reflect
more sweeping changes. Four examples are:

The math program came about because we had carryover
funds accumulating and felt a need for a math program.

(Principal)






Most of the changes that have taken place have been shifts
in the location of the program as the number of eligible
students changes, as the funding increased or decreased,
etc. (Director)

Aides cost more each year so we have to eliminate some.

(Principal)

The math program was started because the state suggested
it. (Director)



I suspect that budgetary and political considerations are even more
influential than the total number of illustrations suggests but would
tend to be mentioned less often, particularly in the context of an
interview directed at local utilization of evaluation.

The third category, subjective information, includes over one-third
of all the illustrations. Most of these illustrations suggest that
changes were based primarily on staff observation of the program. Some
examples are:

I will expand the content area of the reading lab to
science because of the success I have had using social
studies materials, because science is interesting to the
students, and because I hope to help them improve their
work in other classes. (Teacher)

The math program was expanded with additional personnel
and diagnostic tests because we saw the need for these
things; they have enhanced our basic program. (Teacher)

We use informal evaluation (teacher experience) to modify
the curriculum and use trial and error to find the right
activities. The big decisions (e.g., dropping
kindergarten, food, etc.) are political and administrative. If
hard data are available and on the right side, things are
easier to sell. (Director)

Changes are often based on questionnaires filled out by
teachers and principals and my observation.

(Administrator)

Several examples in this category indicate a major concern with program
manageability and teaching philosophy. As examples:

We chose a new reading series because we felt we needed a
less individualized approach and more direct contact. So
we had the faculty evaluate several and also visited other
schools to look at it and the scores. The faculty liked it
because it gave introductions to stories and had built-in
testing. (Principal)




We chose a new math program because the existing curriculum
[illegible] in kindergarten
through sixth grade. Thus we looked at materials. Second,
we looked at whether it would be effective. We did this
involving the whole staff, recommendations from the district,
and we knew we wanted one that didn't rely heavily on
reading and that had built-in tests. (Teacher)

Finally, there were three examples that suggested use of evaluative
information, but not that information reported formally in the evaluation.
This is the category loosely termed "objective." The three examples are:

One school claimed that their self-concept program was great
so we measured it and found no gains. This got them to think
more about what they were doing and what they expected.

(Evaluator)

We have made a major change based on three years of files
from problem solving sessions with teachers. We reduced
record keeping and increased small group activities. We
also changed class size based on teachers' recommendations
and changed materials distribution and space based on their
recommendations. (Administrator)

I got interested in unobtrusive measures to assess library
use. I got a librarian to cooperate and had him checking
to see if Title I kids were reading as much. They were
but it tended to be the easier books. So the librarian
ordered more easy books that would be of interest to the
older kids -- stuff that would not embarrass them.

(Evaluator)

In summary, there are so few examples altogether of connections
between program changes and information that it is risky to generalize
from them. The fact that so few exist, especially examples in which
evaluation was used, is by far the most important finding.






V DISCUSSION

At the local level, Title I evaluations are used primarily to meet
requirements, to provide feedback, and to confirm (when they are
positive) individual feelings toward the program. The evaluations tend
not to be used either as a basis for judging the effectiveness of the
program or as a guide for program decision-making. Sections II - IV
consist primarily of the reasons people gave us for not using evaluation
results for judgments and decision-making. To predict the impact of the
proposed evaluation system, however, it is necessary to go one step
beyond the stated reasons and consider underlying explanations for them,
as well as constraints imposed by the context of Title I programs on the
constructive use of evaluation.

Interpretation of the Findings

The stated reasons for not using evaluations tend to focus on the
characteristics of the information in the evaluation. Standardized
achievement test scores, the backbone of Title I evaluations, are viewed
as inadequate at best for program judgments and planning. Reasons
expressed for this view range from the limitations of these tests in
measuring the attainment of specific skills to the omission of measures
of other outcomes considered important, such as children's attitudes
and parental involvement. These stated reasons imply that if the type
of evaluative information reported were changed, use of the information
would increase. However, a close reading of all the statements of the
respondents suggests otherwise. The statements in toto suggest that
there are unstated explanations for not using evaluation results as well
as constraints on evaluation use imposed by the structure of the
programs, both of which must be addressed directly if use of evaluation in
program planning is to increase. Merely changing the type of
information reported is insufficient in itself.






Section II illustrates that even some of the most common uses of
evaluation are avoided when the results are negative. Uses of
evaluation for public relations or for confirming one's own beliefs occur
only when the results are positive. Sections III and IV demonstrate
that judgments of program effectiveness and decisions about program
changes rely heavily on personal, subjective information without clear
expression of what is being assessed and how. On the other hand, there
is evidence to suggest that standardized test scores are perceived as
useful in contexts other than evaluation, such as in the selection
process and for needs assessments. (These uses are referred to only briefly
in the text since they were not the focal point of the study.)
Together these findings suggest that there are some deeper explanations for
the limited use of evaluations -- reasons that go beyond the
characteristics of the outcomes and measures reported -- and that, therefore, the
stated reasons are best understood as a reflection of the underlying
reasons. Moreover, both levels of explanation must be viewed within
the context of school districts and their Title I programs. I discuss
first this context and then elaborate on the underlying reasons for lack
of use of evaluations.

Context of Title I Programs

Sections III and IV indicate two constraints on the use of
evaluations posed by characteristics of the program and its evaluation. First,
programs tend to be quite stable, thus limiting the universe in which
changes are likely to be made, whether based on evaluations or not.
Second, the timing of the evaluation can by itself restrict its
potential utility by not meshing with the timing of program planning. Since
evaluation results are generally reported after the planning has
occurred, their use is at best limited to that of year-old data.

Several other constraints imposed by the structure of programs
were observed, if not stated directly by respondents. Perhaps most
important is the fact that, in almost every district, there is little
connection between program staff and evaluation staff. This is a
function of the administrative structure of the program in almost every
district. The person or persons responsible for the administration and
the content of the program are not those who are responsible for the
design and conduct of the evaluation. Additionally, the evaluator,
particularly when he/she is external to the program staff, reports
only to the Title I Director and is usually completely isolated from
the program. As one evaluator stated:

I don't know whether the test scores are useful as a basis for
[illegible] deal with the
content of the program. (Evaluator)

Another evaluator expressed distance from the program by saying:

I am not involved with the program or process evaluation. My
main audience is the Education Department of the district and
[illegible]

(Evaluator)

There were two districts in which this gap was bridged, but not without
considerable effort on the part of an administrator in one and the
evaluator in the other. In fact, in the latter case, the evaluator was
attempting to insert the word "Planning" in the title of his office in
order to communicate the relationship he is trying to establish between
program and evaluation.*

Another difficulty posed by the system is that a Title I program
contains multiple potential audiences for evaluation, each of which has
different information needs. Title I evaluation is frequently discussed
in terms of meeting Federal, state, and local needs; often overlooked in
this context, however, is the fact that each LEA is a complex
organization itself, with several levels from the director to curriculum
supervisors or other intermediate administrators to principals and teachers,
as well as parents.



*Even in this situation, however, the evaluator felt that a relationship
between program and evaluation is impossible to establish if it is
instigated only by the evaluator and not supported as well by the
program's administrators.






Finally, there is a general constraint on using evaluation that
stems not from the specific context of each district but from the state
of the art in educational treatments. Ideally, evaluation is expected
to provide evidence on the strengths and weaknesses of programs that can
in turn guide planners on directions in which their programs can be
improved. This ideal presupposes, however, that if a weakness is
identified, there are one or more potential remedies available. The limited
knowledge on what constitutes a successful strategy in educational
treatments therefore limits the extent to which evaluation can be fully
utilized -- both from the lack of proven alternatives and from the feeling
of frustration that this lack produces. This is not meant to imply that
there is a magic solution just around the corner, but rather, that
education is a difficult if not impossible area in which to apply fully a
rational model of evaluation as a guide to decision-making.

The constraints of the system are not necessarily permanent
fixtures, but they do characterize the current state of affairs in the
districts visited, and, I suspect, in most others. As such, they not
only limit uses of evaluation directly, but also strongly affect how
individuals in the system perceive evaluation. The isolation of the
evaluator from the program, the relative stability of programs, and the
timing of evaluations together contribute to a climate that is not
conducive to viewing evaluation as a potentially constructive tool. This
climate provides an important perspective for understanding why
individuals in the system view evaluation as they do. This view, gleaned
from looking beyond what respondents said, is described next.

Underlying Attitudes Toward Evaluation

Two facts about the state of mind of local staff suggest strongly
that, regardless of the type or quality of the evaluation data, the data
are not likely to be favorably received and hence not used. The first
is the narrow and usually negative way in which evaluation is perceived
and the second is the strong motivation of individuals to protect their
basic beliefs.




Put in the simplest terms by one evaluator: "Evaluation is a
dirty word." In general, evaluation is viewed as a set of procedures
designed to provide one's superiors with information on which to judge
the program's success, on the basis of criteria defined by the superiors.
Evaluation is therefore more likely to be associated with the threat of
accountability to someone else than with its potential as a useful source
of information for one's self. To the extent that the evaluation
questions and criteria for success are imposed externally and that the
evaluation is conducted primarily to meet externally imposed requirements, this
negative view of evaluation is reinforced by actual experience.
Furthermore, its threatening nature is exacerbated by the psychological distance
between evaluation and program staff. As long as evaluation is viewed
in this narrow and essentially threatening way, it is doubtful that the
information it contains will be used, regardless of its characteristics.

The second state of mind can be characterized as the "true-believer"
syndrome. It is common knowledge that an individual deeply committed
to a particular belief is not likely to change that belief merely because
"objective" evidence against the belief is presented. Politics and
religion abound with relevant examples. This is not to imply that local
Title I staff and parents are akin to religious zealots, but they are by
and large strongly committed to their programs. When people invest their
time and energy in a cause they view as worthy, they will seek out and
readily accept evidence that their work has not been in vain. Likewise,
they will ignore or explain away information that suggests they have
failed. Title I staff, particularly those involved daily in implementing
the program, often invest considerable energy in their work because they
view it as important and worthwhile. Therefore, it is not surprising
that they interpret evaluation results selectively, accepting the
positive and rejecting the negative. As one director said, "We are
successful even if we can't show it on paper."



Conclusion



From this analysis, I conclude that changing the type or quality
of information contained in the evaluations will not, by itself, affect
the level of evaluation utilization. But simply changing the nature of
the information is the focal point of the USOE evaluation models and the
primary role of the TACs, which is to assist in the implementation of the
models. The models address only the "symptoms," that is, technical
weaknesses of the outcome measures and procedures for data collection and
analysis. I suggest that this approach, and any approach that focuses
exclusively on the information contained in the evaluations, cannot by
itself significantly affect local use of evaluation. Instead, changes in
the evaluation system designed to increase local utilization must address
the underlying reasons for lack of use, including individual attitudes
and beliefs about the program and evaluation. At the same time, such a
system must address those elements of the context amenable to change
that reinforce existing negative views toward evaluation.

Tackling the area of attitude change is obviously far more
challenging than merely changing the test or metric, but it is not beyond
reach. The fact that there are even a few instances of evaluation use
in program decisions suggests that increased use of evaluation is
possible. This fact, together with an understanding of the impediments to
use of data, points to some promising directions for the shaping of a
Federal strategy that can increase local use of evaluation.

Implications

Of the two current Federal strategies for Title I evaluation, an
independent national study and the three-tiered reporting scheme, only
the latter has the potential to encourage local use of evaluation. Since
independent national studies are generally agreed to be the best source
for providing evidence of the national impact of Title I, it should be
possible to emphasize local use of evaluation in the three-tiered
reporting system without sacrificing a source for national impact data.
Therefore, the implications discussed below take the form of recommendations
for radically changing the emphasis of the three-tiered reporting system
to one that encourages local use of the evaluation data.







First of all, any strategy designed to increase local use of
evaluation must be grounded in a Federal commitment to this goal, a commitment
which must be understood and shared by the states and communicated clearly
to local districts. As long as districts collect data primarily or
exclusively for state and Federal use, they are unlikely to change their
views toward evaluation. This suggests, at the least, that deadlines for
evaluation reporting should be coordinated with the local planning cycle.

Second, districts need assistance in increasing communication and
cooperation between program staff and evaluation staff. Our visits
suggest that the provision of feedback can be used as one way to facilitate
understanding between program and evaluation staff. However, the
information fed back must be designed in a way that makes it clearly
understandable to staff and parents and must address the different needs of
different levels within a district. For example, a curriculum
supervisor overseeing a program in six schools views the program from a
different perspective and has information needs different from those of a
classroom teacher. Additionally, the findings suggest that results
should be presented in person if they are to be clearly understood and
hence utilized by staff.

Finally, Title I staff and parents need assistance in developing
an understanding of the constructive role that evaluation can play and
in acquiring certain types of nontechnical evaluation skills.
Incorporating evaluation information into planning and decision making is not an
automatic process, yet it is one in which local staff have received little
if any training. In particular, they need assistance in learning how to
ask their own evaluation questions. If the primary purpose of evaluation
remains that of answering questions imposed externally, the evaluation
will continue to be potentially more threatening than helpful. If, on
the other hand, the evaluation responds to questions about program
effectiveness that the staff have expressed interest in, the potential for
using the results should increase dramatically. Until Title I staff and
parents come to see Title I as a program to be improved continually based
in part on evaluation, the evaluation results, even if technically sound,
will fall on deaf ears.




The areas described above are limited to the ones that I feel are
amenable to change through Federal policy and the provision of technical
assistance. However, they do provide a starting point for designing an
evaluation strategy whose primary aim is to provide local staff and
parents with information of use to them in improving their programs.
Additionally, the more of these issues that are addressed concurrently by
an evaluation strategy, the greater the likelihood of success. Treating
each cause of lack of utilization separately is less likely to affect
basic attitudes toward evaluation than treating as many as possible
concurrently.

Until local staff view evaluation in a positive light, effort devoted
exclusively to the development of technically sound data will be wasted.
In the absence of use of evaluation information, it is impossible to
determine the extent to which the types of measures employed facilitate
or impede use of the results. This is not to imply that the issue of
measures should be ignored. Use of information is determined jointly by
the characteristics of the information and the characteristics of
potential users. Furthermore, the characteristics of the information can, in
theory, affect the attitudes of the audience. From the comments of
respondents, I suspect that measures other than standardized achievement
tests should be included in the evaluation, at the least. Given the
current state of affairs, however, the issue of outcome measures is far
less important than that of redesigning the evaluation strategy to
encourage local use through addressing the impediments discussed above.

The current technical assistance strategy, if redirected, can serve
as a powerful force in changing how evaluation is perceived and thereby
increase evaluation use at the local level. To accomplish this goal,
however, technical assistance must be redesigned to communicate a new
view of the role of evaluation and to develop skills such as generating
one's own evaluation questions. As long as technical assistance is
defined narrowly as a way of telling local staff "how to improve the
quality of their data," it will not affect local use of evaluation.






As a final note, I would like to add a context for these findings
that extends beyond the education community. The failure to use
information as rational models would predict is the rule rather than the
exception among decisionmakers in every walk of life that has been investigated.
Thus, the portrait of educators as irrational that might be drawn from
this report could equally well describe their counterparts among lawyers,
physicians, or policymakers in general. The point is not to decry
irrationality, but rather to direct resources toward activities that have
the potential to increase the rational component of decisions.






REFERENCES



Cohen, D.K. and M. Garet, "Reforming Educational Policy with Applied
Social Research," Harvard Educational Review, Volume 45, No. 1, pp. 17-43
(1975).

Frankel, C., Controversies and Decisions (Russell Sage Foundation, New York,
1976).

GAO, "Problems and Needed Improvements in Evaluating Office of Education
Programs," Report to the Congress, General Accounting Office, Comptroller
General of the United States (September 1977).

McLaughlin, M.W., Evaluation and Reform: The Elementary and Secondary
Education Act of 1965, Title I (Ballinger, Cambridge, Massachusetts, 1975).

RMC, "Further Documentation of State ESEA Title I Reporting Models and
Their Technical Assistance Requirements, Phase I," Report to the Office
of Education, RMC Research Corporation, Mountain View, California (1976).

USCAN, "Legislative History of P.L. 93-380: Education Amendments of 1974,"
U.S. Congressional and Administrative News (1974).

Weiss, C., Using Social Research in Public Policy Making (D.C. Heath,
Lexington, Massachusetts, 1977).






